
feat: CPAN smoke test, HTML::Parser XS backend, and MakeMaker deferred installation #412

Merged: fglock merged 14 commits into master from feature/io-file-new-tmpfile on Mar 31, 2026


@fglock fglock commented Mar 31, 2026

Summary

MakeMaker deferred installation (key change)

  • Defer file installation to make phase: Previously WriteMakefile() installed .pm files immediately during Makefile.PL, before CPAN.pm could detect and install dependencies. Now it writes MYMETA.yml + a Makefile with real cp commands, and files are only installed when make runs — after CPAN.pm has resolved all dependencies.
  • This enables proper recursive dependency resolution for any CPAN module (tested with 3-level dep chains)

HTML::Parser/HTML::Entities Java XS backend

  • HTMLParser.java: Single Java file implementing both HTML::Parser (basic event-driven parsing) and HTML::Entities (full decode support including numeric, named, surrogate pairs)
  • HTML::Parser tests: 190/415 passing (up from 0)

Other fixes

  • IO::Socket::INET: Replace exists(&Errno::EINVAL) (an unsupported dynamic pattern) with eval { Errno::EINVAL() }
  • Encode::Alias: Support find_encoding() via XSLoader deferred load
  • Regex preprocessor: Fix incorrect rejection of lookaheads and escaped pipes
  • exit(): Let END blocks modify $?; require shows full error messages
  • ExtUtils::MakeMaker exports: Add $VERSION, $Verbose, _sprintf562 to @EXPORT_OK

CPAN smoke test infrastructure

  • New dev/tools/cpan_smoke_test.pl with curated module registry, regression detection, --compare support
  • Parallel execution via --jobs N
  • Module registry with Parse::RecDescent, Spreadsheet::WriteExcel, Image::ExifTool, top-20 CPAN modules

Smoke test results (post changes)

  • New PASS: HTTP::Message, Devel::Cover, Spreadsheet::ParseExcel (1612/1612), IO::Stringy (127/127)
  • Near-PASS: Test::Deep (1266/1268), Moo (809/840), Log::Log4perl (715/719)

Test plan

  • make passes (all unit tests)
  • MakeMaker deferred install verified: Data::Compare → File::Find::Rule → Number::Compare + Text::Glob all auto-resolved
  • perl -c dev/tools/cpan_smoke_test.pl — syntax OK
  • Smoke test modules install and run

Generated with Devin

@fglock fglock changed the title from "feat: CPAN smoke test improvements and MakeMaker exports" to "feat: CPAN smoke test, HTML::Parser XS backend, and MakeMaker deferred installation" Mar 31, 2026
fglock and others added 14 commits March 31, 2026 21:40
…_sprintf562)

Devel::Cover Makefile.PL imports $VERSION from ExtUtils::MakeMaker,
which was not in @EXPORT_OK. Also adds $Verbose variable and _sprintf562
positional sprintf variant used by ExtUtils::MM_Any and other internal
modules.

- Add $VERSION, $Verbose, _sprintf562 to @EXPORT_OK
- Add our $Verbose = 0 declaration
- Add _sprintf562() subroutine (positional %1$s format)
- Add dev/modules/devel_cover.md fix plan
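
The positional `%1$s` syntax mentioned above has a direct counterpart in Java's `String.format`. The following is a minimal sketch of the idea only, not the `_sprintf562()` implementation; the class and method names are hypothetical.

```java
class PositionalFormatSketch {
    // Hypothetical helper: arguments are selected by explicit index (%1$s, %2$s),
    // so the template may reorder or repeat them.
    static String render(String template, Object... args) {
        return String.format(template, args);
    }

    public static void main(String[] args) {
        // The second argument is printed first because the template asks for %2$s first.
        System.out.println(render("%2$s requires %1$s", "Text::Glob", "File::Find::Rule"));
    }
}
```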

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Adds a tool to run jcpan -t on a curated registry of CPAN modules and
report installation/test status, XS detection, and regressions.

Features:
- Registry of 27 modules with category (known-good/partial/blocked)
- XS status tracking (pure-perl/java-xs/xs-with-pp-fallback/xs-required)
- Parses Test::Harness output for pass/fail counts
- Isolates target module results from dependency test output
- Regression detection via --compare with previous .dat files
- --quick mode for known-good regression checks

Usage: perl dev/tools/cpan_smoke_test.pl [--quick|--list] [Module...]

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Fork-based job pool allows running multiple jcpan -t processes
concurrently. Each child writes results to a temp file; parent
collects and reports as they finish.

Default remains sequential (--jobs 1) for safety since parallel
jcpan runs share ~/.perlonjava/lib/.
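
The tool itself is a Perl script that forks child processes. As an illustration of the same "bounded pool, collect results as they finish" pattern, here is a sketch that swaps in Java threads for forked children. `testModule` is a hypothetical placeholder and does not run jcpan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class JobPoolSketch {
    // Hypothetical stand-in for one `jcpan -t <module>` run; it only builds a result line.
    static String testModule(String module) {
        return module + ": done";
    }

    // Runs at most `jobs` tasks at once and returns results in completion order.
    static List<String> runAll(List<String> modules, int jobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobs));
        CompletionService<String> finished = new ExecutorCompletionService<>(pool);
        try {
            for (String module : modules) {
                finished.submit(() -> testModule(module));
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < modules.size(); i++) {
                results.add(finished.take().get()); // blocks until the next job finishes
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // jobs = 1 mirrors the sequential default described above.
        runAll(List.of("Moo", "Test::Deep", "Log::Log4perl"), 1).forEach(System.out::println);
    }
}
```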

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
… to smoke test

All three are pure-Perl modules. Parse::RecDescent depends on the
bundled Text::Balanced. Spreadsheet::WriteExcel would unlock
t/46_save_parser.t, which Spreadsheet::ParseExcel currently skips.
Image::ExifTool already passes 590/600 tests via its dedicated runner.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add 9 modules from CPAN top-20 (by favorites + reverse deps):
- Partial: JSON, Type::Tiny, List::MoreUtils, Template, Mojolicious
- Blocked: Plack, LWP::UserAgent, DBIx::Class, DBI

Create dev/modules/smoke_test_investigation.md documenting:
- Shared root causes (Clone::PP, MIME::Base64 $VERSION, Encode::Locale,
  PerlIO::encoding, exit codes)
- Per-module failure analysis for all 39 registered modules
- Prioritized fix order by impact

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
P1: Create Clone::PP using Storable::dclone (unblocks HTTP::Message chain)
P2: Set MIME::Base64 $VERSION=3.16 in both .pm and Java backend
P4: Create PerlIO::encoding stub (helps IO::HTML and encoding-aware modules)

Additional fixes:
- Add $VERSION to bundled JSON.pm (fixes JSON CONFIG_FAIL in smoke test)
- Create Template::Stash::XS shim inheriting from Template::Stash
  (pure Perl fallback; Template still blocked by P6 regex bug)

Update investigation plan with P6 (regex engine \| alternation bug)
and progress tracking for completed fixes.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ipes

Two bugs fixed in RegexPreprocessor.java:

1. Quantifier validation used raw StringBuilder inspection to detect
   "quantifier follows nothing" after |. This failed for escaped \|
   (literal pipe) which also ends with | in the output buffer.
   Replaced with the existing lastWasQuantifiable flag which already
   correctly distinguishes alternation | (sets false) from escape
   sequences like \| (sets true).

2. Lookahead (?=), lookbehind (?<=, (?<!), negative lookahead (?!),
   and atomic groups (?>) were routed through handleRegularParentheses
   which only appended '(' and started recursive parsing at the '?'.
   The recursive handleRegex then treated '?' as a quantifier, causing
   "Quantifier follows nothing" errors. Fixed by appending the full
   group opener (e.g., "(?=") and starting recursive parsing after it.

   This also fixes incorrect capture group counting - these non-capturing
   constructs were being counted as capturing groups.

Patterns that now work:
- qr/\|\|?/             (escaped pipe with quantifier)
- /(a)(?=b)/            (lookahead after capture)
- /(.*)(?:::|')(?=.)/   (constant.pm pattern used by Template Toolkit)
- /(?<=a)b/             (lookbehind)
- /(?>a+)b/             (atomic group)
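
As a sanity check on the target side, the patterns above are all valid `java.util.regex` syntax once they reach the engine intact, and only the parenthesized groups without a `(?` prefix count as capturing. The sketch below exercises `java.util.regex` directly and does not call RegexPreprocessor; the class name is hypothetical.

```java
import java.util.regex.Pattern;

class RegexGroupCountSketch {
    // Compiles the pattern with java.util.regex and reports how many capturing groups it has.
    static int captureGroups(String javaRegex) {
        return Pattern.compile(javaRegex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        String[] patterns = {
            "\\|\\|?",            // escaped pipe followed by an optional escaped pipe
            "(a)(?=b)",           // capture followed by a lookahead
            "(.*)(?:::|')(?=.)",  // one capture, one non-capturing group, one lookahead
            "(?<=a)b",            // lookbehind
            "(?>a+)b"             // atomic group
        };
        for (String p : patterns) {
            System.out.println(p + " has " + captureGroups(p) + " capturing group(s)");
        }
    }
}
```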

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
… load

Previously Encode was pre-loaded at startup (GlobalContext.initialize),
which set %INC and prevented Encode.pm from executing.  This meant any
Perl-level wrapper code in Encode.pm was dead.

Changes:
- Defer Encode initialization to XSLoader::load (like TimeHiRes,
  UnicodeNormalize).  Encode.pm now runs normally when `use Encode`
  is called.
- Set Encode constructor to setInc=false so %INC is managed by
  require, not the Java module.
- Add a Perl-level find_encoding wrapper in Encode.pm that falls back
  to Encode::Alias::find_alias when the Java charset lookup fails.
  This enables coderef/regex/string aliases registered by modules
  like Encode::Locale (e.g. "locale" -> "UTF-8").
- Add resolve_alias() implementation.
- Per-name recursion guard prevents circular alias chains.
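
The fallback and the recursion guard can be pictured roughly as follows. This is a simplified Java model under stated assumptions: only string aliases are handled (the real wrapper lives in Encode.pm and also supports coderef and regex aliases), and all names here are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AliasLookupSketch {
    private final Map<String, String> aliases = new HashMap<>();
    private final Set<String> resolving = new HashSet<>(); // names currently being resolved

    void defineAlias(String alias, String target) {
        aliases.put(alias, target);
    }

    // Tries the charset lookup first, then follows aliases; returns null if nothing matches.
    String findEncoding(String name) {
        try {
            if (Charset.isSupported(name)) {
                return Charset.forName(name).name();
            }
        } catch (IllegalArgumentException e) {
            // Not a legal charset name; fall through to the alias table.
        }
        if (!resolving.add(name)) {
            return null; // this name is already on the call stack: circular alias chain
        }
        try {
            String target = aliases.get(name);
            return target == null ? null : findEncoding(target);
        } finally {
            resolving.remove(name);
        }
    }

    public static void main(String[] args) {
        AliasLookupSketch lookup = new AliasLookupSketch();
        lookup.defineAlias("locale", "UTF-8");
        System.out.println(lookup.findEncoding("locale"));
    }
}
```

The guard is keyed by name, so two unrelated lookups do not block each other while a chain that loops back on itself terminates with "not found".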

Unblocks: Encode::Locale, HTTP::Message chain (partially)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Two bugs fixed:

1. exit() ignored modifications that END blocks made to $?, contrary to
   the Perl 5 semantics described in perlvar. It now sets $? to the exit
   code before running END blocks, calls runEndBlocks(false) to preserve
   it, and reads $? back as the final exit code. This fixes Test::Needs
   (200/227 -> 227/227), where Test2's END block needs to override
   exit(0) with a failure code.

2. require error messages were missing "Compilation failed in require".
   Now builds the full Perl 5-compatible error (original + newline +
   "Compilation failed in require") and sets $@ before throwing, so
   eval{} sees the complete message.
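
The ordering in the first fix can be modeled in a few lines. This is a sketch of the semantics only, not the runtime's code: each END block is represented by a hypothetical function from the current `$?` to the new `$?`, and blocks run in reverse order of definition, as Perl does.

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

class ExitStatusSketch {
    // Returns the process exit code after END blocks have had a chance to change $?.
    static int finalExitCode(int requested, List<IntUnaryOperator> endBlocks) {
        int status = requested; // step 1: $? is set to the requested exit code
        for (int i = endBlocks.size() - 1; i >= 0; i--) {
            status = endBlocks.get(i).applyAsInt(status); // step 2: each END block may rewrite $?
        }
        return status; // step 3: $? is read back as the final exit code
    }

    public static void main(String[] args) {
        // Models a test framework whose END block turns exit(0) into a failure code.
        IntUnaryOperator failIfClean = status -> status == 0 ? 1 : status;
        System.out.println(finalExitCode(0, List.of(failIfClean)));
    }
}
```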

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Implements HTMLParser.java providing:
- Full HTML::Entities decode_entities() and _decode_entities() with
  numeric (decimal/hex), named entity, surrogate pair, and prefix
  expansion support - ported from util.c
- UNICODE_SUPPORT() and _probably_utf8_chunk() stubs
- HTML::Parser construction (_alloc_pstate), 13 boolean accessors,
  handler registration, and basic event-driven HTML parsing
- Cross-package registration matching original Parser.xs layout
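
To illustrate the numeric and surrogate-pair cases, here is a much-reduced decoder. It is a sketch and not the port from util.c: it knows only four named entities, requires the trailing semicolon, and omits prefix expansion. The class name and entity table are assumptions made for the example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecodeSketch {
    private static final Pattern ENTITY = Pattern.compile(
        "&(?:#[xX]([0-9a-fA-F]{1,6})|#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]*));");
    // Deliberately tiny table; the real module knows the full HTML entity set.
    private static final Map<String, String> NAMED =
        Map.of("amp", "&", "lt", "<", "gt", ">", "quot", "\"");

    static String decode(String in) {
        Matcher m = ENTITY.matcher(in);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(in, last, m.start());
            String replacement = null;
            if (m.group(3) != null) {
                replacement = NAMED.get(m.group(3));
            } else {
                int cp = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
                if (cp <= 0xFFFF) {
                    // Emitted as one UTF-16 unit, so adjacent high/low surrogate
                    // entities join into a single supplementary character.
                    replacement = String.valueOf((char) cp);
                } else if (cp <= Character.MAX_CODE_POINT) {
                    replacement = new String(Character.toChars(cp));
                }
            }
            // Unknown names and out-of-range numbers are left untouched.
            out.append(replacement != null ? replacement : m.group());
            last = m.end();
        }
        out.append(in, last, in.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&lt;b&gt; &#65;&#x42; &amp; &unknown;"));
    }
}
```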

Unblocks: HTTP::Message (now PASS), Devel::Cover (now PASS),
HTML::Parser 190/415 tests passing.

Also comments out Mojolicious/Moose/Plack from smoke tests (timeout).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
All priority fixes P1-P7 now marked as DONE:
- P3: Encode::Alias find_encoding wrapper
- P5: exit() END block $? handling
- P6: Regex preprocessor lookaheads/escaped pipes
- P7: HTML::Parser Java XS backend Phase 1

Updated module status: HTTP::Message PASS, Devel::Cover PASS,
HTML::Parser 190/415, Test::Needs 227/227 PASS.
Added latest smoke test results table.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
… resolution

Previously, WriteMakefile() installed .pm files immediately during
Makefile.PL execution, before CPAN.pm could detect and install missing
dependencies. This meant modules were installed in a broken state when
deps were missing.

Now WriteMakefile() only writes MYMETA.yml (for dep detection) and a
Makefile with real cp commands. Actual file installation happens when
CPAN.pm runs 'make', after it has resolved and installed all
dependencies from MYMETA.yml.

Flow: Makefile.PL → MYMETA.yml written → CPAN detects deps → installs
deps → runs 'make' → files installed.

Also fixes IO::Socket::INET loading by replacing exists(&Errno::EINVAL)
(unsupported dynamic pattern) with eval { Errno::EINVAL() }.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…lone

Storable::dclone does not preserve code references - it turns them into
broken strings where ref() returns empty and calling them fails with
Undefined subroutine errors.

This broke DateTime (via Specio which uses Clone::PP to clone attribute
definitions containing inline_generator coderefs).

The new implementation handles hashes, arrays, scalar refs, and circular
references. Code refs, globs, and regexps are returned as-is (shared)
since they are immutable.
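
The cloning strategy described above (copy containers, track already-visited nodes, share everything else) can be sketched in Java with an identity map. This models the approach rather than the Perl code in Clone::PP; maps and lists stand in for hashes and arrays, and a `Runnable` stands in for a code ref.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DeepCloneSketch {
    static Object deepClone(Object value) {
        return copy(value, new IdentityHashMap<>());
    }

    // `seen` maps each original container to its copy so cycles resolve to the copy.
    private static Object copy(Object value, Map<Object, Object> seen) {
        if (value instanceof Map || value instanceof List) {
            Object existing = seen.get(value);
            if (existing != null) {
                return existing;
            }
        }
        if (value instanceof Map) {
            Map<Object, Object> out = new LinkedHashMap<>();
            seen.put(value, out); // register before descending, so self-references work
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(e.getKey(), copy(e.getValue(), seen));
            }
            return out;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            seen.put(value, out);
            for (Object item : (List<?>) value) {
                out.add(copy(item, seen));
            }
            return out;
        }
        return value; // everything else, including the code-ref stand-in, is shared as-is
    }

    public static void main(String[] args) {
        Map<String, Object> attr = new HashMap<>();
        Runnable generator = () -> System.out.println("still callable after cloning");
        attr.put("inline_generator", generator);
        attr.put("self", attr); // circular reference
        Map<?, ?> clone = (Map<?, ?>) deepClone(attr);
        ((Runnable) clone.get("inline_generator")).run();
    }
}
```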

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Two fixes in RegexPreprocessor.handleRegex():

1. When a quantifier follows a non-quantifiable item and the previous
   character in the output buffer is itself a quantifier (*, +, ?, }),
   emit "Nested quantifiers" instead of "Quantifier follows nothing".
   The lastWasQuantifiable check was short-circuiting before the nested
   quantifier detection could run, producing the wrong error message
   for patterns like a**, .{1}??, .{1}?+.

2. When stripping \G from the beginning of a group, also strip any
   following quantifier (?, *, +). Since \G is removed for Java regex
   compilation, its quantifier would be left dangling. For example,
   (\G?[ac])? is now correctly preprocessed to ([ac])?.
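
The second fix amounts to consuming one extra character when `\G` is dropped. A minimal sketch, assuming the function receives the text just inside the group's opening parenthesis (the actual logic lives in RegexPreprocessor.handleRegex() and is not reproduced here):

```java
class StripLeadingGSketch {
    // Removes a leading \G and, if present, the single quantifier (?, *, +) attached to it.
    static String stripLeadingG(String groupBody) {
        if (!groupBody.startsWith("\\G")) {
            return groupBody;
        }
        int next = 2; // index just past the two characters of \G
        if (next < groupBody.length() && "?*+".indexOf(groupBody.charAt(next)) >= 0) {
            next++; // drop the quantifier too, so it is not left dangling
        }
        return groupBody.substring(next);
    }

    public static void main(String[] args) {
        // Body of the group from the example (\G?[ac])?
        System.out.println("(" + stripLeadingG("\\G?[ac]") + ")?");
    }
}
```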

Fixes re/regexp.t (-4 across 6 variant files) and re/reg_mesg.t (-2)
regressions introduced in fb776af.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@fglock fglock force-pushed the feature/io-file-new-tmpfile branch from 816ab70 to 6c322a9 Compare March 31, 2026 19:41
@fglock fglock merged commit 1d21379 into master Mar 31, 2026
2 checks passed
@fglock fglock deleted the feature/io-file-new-tmpfile branch March 31, 2026 19:47